This invention relates to voice packet playout schemes and in particular to an
Adaptive Predictive Playout Scheme for Packet Voice Applications.

Background of the Invention 

The revolution in high-speed communication networks, an example of which is the
Internet, has given rise to the potential for enabling the deployment of multimedia
applications. These applications, however, require stringent quality of service (QoS)
guarantees, such as bounded delay and jitter. The current Internet was originally designed
to offer best effort service without any QoS guarantees. In such a packet switching
environment, the delay of each packet varies greatly due to the complexities of the
network traffic and to the traffic scheduling algorithms implemented for efficient
utilization of bandwidth. Voice data or speech packets are generally considered to be
transported at a variable bit rate (VBR). As a result, the problem of unbounded jitter,
introduced by the networks, often renders the speech unacceptable or even unintelligible.
It thus becomes essential to offer control mechanisms to obtain distinctive QoS
guarantees.

Essentially, voice applications can be broadly classified as either interactive or
unidirectional. Serving dissimilar purposes, these two classes of applications differ in
playout delay requirements and the tolerances for playout impairment. Interactive voice
applications are more sensitive to playout delay than playout impairment due to their
real-time nature. It is therefore acceptable in interactive voice applications to trade some
playout impairment for better playout delay.

Methods of buffering packets at the receiver end have been extensively studied.
Such prior art methods include I-Policy and E-Policy [W.E. Naylor and L. Kleinrock,
"Stream Traffic Communication in Packet Switched Networks: Destination Buffering
Considerations", IEEE Transactions on Communications, Vol. COM-30, No. 12, Dec
1982; and D.L. Stone and K. Jeffay, "An Empirical Study of Delay Jitter Management
Policies", Multimedia Systems, pp. 267-279, Vol. 2, No. 6, Jan 1995]. However, these
schemes do not adapt to traffic conditions, such as delay and jitter, which may vary from
time to time. Adaptive playout schemes have also been proposed based on an assumption
that the level of traffic conditions like delay jitter for the near future can be estimated in
terms of the observed level in the recent past [D.L. Stone and K. Jeffay, "An Empirical
Study of Delay Jitter Management Policies", Multimedia Systems, pp. 267-279, Vol. 2,
No. 6, Jan 1995].

It is therefore an aspect of an object of the invention to provide a control
mechanism for improving the utilization of resources and for optimizing service
performance.



Summary of the Invention 

According to an aspect of the invention, there is provided an adaptive predictive
playout scheme, based on a Least Mean Square (LMS) prediction algorithm, for packet
voice applications. The packets are received and stored in a buffer for playout at a
constant draining rate P0, where P0 is determined by the codec used.

When the number of packets in the buffer is greater than a buffer threshold L0, the
arrival interval of the next incoming packet is predicted based on the LMS prediction
algorithm. If the estimated arrival interval for the next packet is smaller than a draining
threshold D0, then the next packet is predicted to arrive at the destination relatively early
and thus is predicted to be buffered relatively longer than the previously received packets.
However, if the oldest packet in the buffer is discarded, then the latency (time of packet in
the buffer) of the next packet is expected to be reduced, but without increasing the
probability of causing a gap, as there are a number of packets in the buffer queue.

If, however, the prediction for the next packet arrival interval is greater than the
draining threshold D0, then this next packet is expected to arrive at a time when all of the
packets have been played out, and thus no packets need to be discarded. With such a
prediction, the receiver continues to play out the remaining packets provided that the
maximum acceptable playout latency is not exceeded. After the playout of the last packet
of the talkspurt, or in the event that no packet has arrived for some time since the arrival of
the last packet, the talkspurt playout is finished. The receiver then starts or resets to play
out the next talkspurt.


Brief Description of the Drawings 

In the accompanying drawings: 
Figure 1 is a time-line diagram illustrating voice source behavior;
Figure 2 is a block diagram illustrating a linear predictor according to the invention;
Figure 3 is a block diagram illustrating an adaptive linear predictor according to the
invention;
Figure 4 is a flowchart illustrating LMS prediction algorithms according to the invention;
Figure 5 is a block diagram illustrating an adaptive prediction playout mechanism utilizing
LMS prediction algorithms of Figure 4; and
Figure 6 shows flowcharts illustrating an adaptive predictive playout scheme in accordance
with Figure 5.

Detailed Description of the Preferred Embodiments 

For voice data as shown in Figure 1, during a talkspurt of duration 1/a, packets of
speech are generated at fixed intervals T. During silence periods, no packets are
generated. At the receiver end, the received constant-size voice packets are played out at a
constant bit rate.

Talkspurts of speech are of relatively short duration (1/a). At the receiver end, the
packet arrival intervals for talkspurts are assumed to be statistically stationary.
Consequently, an LMS prediction algorithm can be used to predict the packet arrival
intervals.
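As a small illustration of the series of packet arrival intervals used in the prediction below, the following Python sketch derives the intervals from receiver timestamps. The function name `arrival_intervals`, the millisecond units, and the sample timestamps are assumptions made for this example only; they do not appear in the description.

```python
def arrival_intervals(arrival_times_ms):
    """Return the intervals between consecutive packet arrivals.

    arrival_times_ms: arrival timestamps in milliseconds, in arrival order.
    """
    return [later - earlier
            for earlier, later in zip(arrival_times_ms, arrival_times_ms[1:])]


# Hypothetical talkspurt: packets sent at a fixed interval but delayed
# unevenly by the network, so the observed intervals vary.
timestamps = [0.0, 3.5, 6.0, 10.0, 12.5]
intervals = arrival_intervals(timestamps)
```

Each element of `intervals` corresponds to one value of the interval series that the predictor consumes.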

Thus, where x(t) (t = 0, 1, 2, ...) denotes a series of packet arrival intervals, the
problem of voice packet arrival interval series prediction involves predicting the value of
x(t + l) from the known x(t - n + 1), x(t - n + 2), ..., x(t), where x(t) is the arrival interval
of the most recently received packet. When l = 1, this process is referred to as one-step
prediction. The well known least mean square (LMS) error linear prediction is based on
Wiener-Hopf equations, whereby a k-step linear predictor predicts x(n + k) using a linear
combination of the current and previous values of x(n). Thus, the pth-order linear
prediction is obtained by the following equation:

$$\hat{x}(n+k) = \sum_{l=0}^{p-1} w(l)\, x(n-l)$$

where w(l) are the prediction filter coefficients, for l = 0, 1, 2, ..., p-1. A linear predictor is
illustrated in Figure 2 where

$$\hat{x}(n+k) = \mathbf{w}^{T}\mathbf{x}(n) \tag{3.1}$$

$$\mathbf{w} = [w(0), w(1), \ldots, w(p-1)]^{T}$$

$$\mathbf{x}(n) = [x(n), x(n-1), \ldots, x(n-p+1)]^{T}$$

$$e(n) = x(n+k) - \hat{x}(n+k) \tag{3.2}$$
From equations (3.1) and (3.2),

$$e(n) = x(n+k) - \mathbf{w}^{T}\mathbf{x}(n) \tag{3.3}$$

The optimal linear predictor in the mean square sense is the one that minimizes the mean
square error $\xi$ where

$$\xi = E\{e(n)^{2}\} \tag{3.4}$$

Since $\xi$ is a quadratic function, it has a unique minimum. Therefore, the vector w that
minimizes $\xi$ is found by taking the gradient of $\xi$, setting it equal to zero, and then
solving for w:

$$\nabla\xi = 0$$

$$\nabla\xi = \nabla E\{e(n)^{2}\} = -2E\{e(n)\mathbf{x}(n)\} = 0$$

Substituting the value for e(n) from equation (3.3),

$$\nabla\xi = -2E\{[x(n+k) - \mathbf{w}^{T}\mathbf{x}(n)]\,\mathbf{x}(n)\} = 0$$

Then

$$E\{x(n+k)\,\mathbf{x}(n)\} = E\{[\mathbf{w}^{T}\mathbf{x}(n)]\,\mathbf{x}(n)\} \tag{3.5}$$

If x(n) (n = 0, 1, 2, ...) is wide-sense stationary, the correlation between x(n) and x(n + k) is
only a function of k, $r_x(k)$:

$$r_x(k) = E\{x(n+k)\,x(n)\} \tag{3.6}$$

From the left side of equation (3.5),

$$E\{\mathbf{x}(n)\,x(n+k)\} = \begin{bmatrix} r_x(k) \\ r_x(k+1) \\ \vdots \\ r_x(k+p-1) \end{bmatrix} = \mathbf{r}(k)$$

From the right side of equation (3.5),

$$E\{[\mathbf{w}^{T}\mathbf{x}(n)]\,\mathbf{x}(n)\} = E\{\mathbf{x}(n)\mathbf{x}(n)^{T}\}\,\mathbf{w} = \begin{bmatrix} r_x(0) & r_x(1) & \cdots & r_x(p-1) \\ r_x(1) & r_x(0) & \cdots & r_x(p-2) \\ \vdots & \vdots & \ddots & \vdots \\ r_x(p-1) & r_x(p-2) & \cdots & r_x(0) \end{bmatrix} \mathbf{w} = \mathbf{R}_x \mathbf{w}$$

where w is the vector of coefficients, $\mathbf{R}_x$ is a p x p Hermitian Toeplitz matrix of
auto-correlations, and r(k) is the vector of cross-correlations between the predicted value
x(n+k) and x(n).

Thus,

$$\mathbf{R}_x \mathbf{w} = \mathbf{r}(k) \tag{3.7}$$



The equations in (3.7) are the Wiener-Hopf equations for linear prediction. For a
one-step prediction (k = 1), the set of linear equations in (3.7) is equivalent to the set of
linear equations used to fit a pth-order autoregressive (AR) process with the exception of a
minus sign. The solution to the equations in (3.7) requires knowledge of the
auto-correlation of x(n), and it also assumes that x(n) is wide sense stationary, i.e., the mean,
variance, and auto-covariance of x(n) do not change with time. It also requires inverting
$\mathbf{R}_x$, whose size depends on the order p of the linear predictor.
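To make the cost of the direct solution concrete, the sketch below estimates the auto-correlations from a block of sample intervals and solves the Wiener-Hopf equations for the coefficients. It is a minimal illustration under stated assumptions: the helper names (`autocorrelation`, `solve_linear_system`, `wiener_hopf_predictor`), the biased sample estimator, and the sample data are hypothetical and are not taken from the description.

```python
def autocorrelation(x, lag):
    """Biased sample estimate of r_x(lag) = E{x(n + lag) x(n)}."""
    n = len(x)
    return sum(x[i + lag] * x[i] for i in range(n - lag)) / n


def solve_linear_system(a, b):
    """Solve a w = b by Gaussian elimination with partial pivoting."""
    size = len(b)
    # Augmented matrix [a | b] so that row operations update both sides.
    m = [list(a[i]) + [b[i]] for i in range(size)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, size):
            factor = m[row][col] / m[col][col]
            for j in range(col, size + 1):
                m[row][j] -= factor * m[col][j]
    # Back substitution on the upper-triangular system.
    w = [0.0] * size
    for row in reversed(range(size)):
        tail = sum(m[row][j] * w[j] for j in range(row + 1, size))
        w[row] = (m[row][size] - tail) / m[row][row]
    return w


def wiener_hopf_predictor(x, p, k=1):
    """Return coefficients w solving R_x w = r(k) for the sample data x."""
    r = [autocorrelation(x, lag) for lag in range(k + p)]
    # R_x is Toeplitz: entry (i, j) is r_x(|i - j|).
    big_r = [[r[abs(i - j)] for j in range(p)] for i in range(p)]
    # r(k) = [r_x(k), r_x(k + 1), ..., r_x(k + p - 1)]^T
    rhs = [r[k + i] for i in range(p)]
    return solve_linear_system(big_r, rhs)


# Hypothetical arrival intervals in milliseconds.
intervals = [3.0, 2.0, 4.0, 3.0, 2.5, 3.5, 3.0, 2.0, 4.0, 3.0]
w = wiener_hopf_predictor(intervals, p=2)
predicted_next = w[0] * intervals[-1] + w[1] * intervals[-2]
```

This approach needs the whole block of data before any coefficient can be computed, which is the limitation the on-line LMS approach avoids.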

LMS for prediction does not require any prior knowledge of the auto-correlation of
a sequence. Therefore, it can be used as an on-line algorithm to predict time intervals. A
signal diagram of an adaptive linear predictor is shown in Figure 3. The prediction
coefficients w(n) are time-varying. The errors {e(n)} are fed back and used to adapt the
filter coefficients in order to decrease the mean square error. As time progresses, the p
latest values of x(n) are captured to predict the value of x(n+1), in the manner of a
sliding window over a timeline that predicts the next value in terms of a few of the latest
values.

The steps of an LMS prediction algorithm according to the invention are: start with
an initial estimate of the filter (prediction) coefficients w(0); and for each new data point,
compute $\nabla\xi$, where

$$\nabla\xi = -2E\{e(n)\mathbf{x}(n)\}.$$

In practice, the statistics are not known and may change with time. Therefore, the
expectation operator E is replaced with an estimate. The simplest estimate is the one point
sample average e(n)x(n). The gradient $\nabla\xi$ is then used to update w(n) by taking a step
of size $0.5\mu$ ($\mu$ is an adaptation constant for adjusting the prediction errors) in the
direction of the negative gradient. The update equations for the LMS filter coefficients are:

$$\mathbf{w}(n+1) = \mathbf{w}(n) - 0.5\,\mu\,\nabla\xi$$

$$\mathbf{w}(n+1) = \mathbf{w}(n) + \mu\, e(n)\,\mathbf{x}(n) \tag{3.7b}$$

If x(n) is stationary, w(n) converges in the mean to the optimal solution of
$\mathbf{R}_x \mathbf{w} = \mathbf{r}(k)$. The LMS thus converges in the mean if
$0 < \mu < 2/\lambda_{\max}$, where $\lambda_{\max}$ is the maximum eigenvalue of
$\mathbf{R}_x$.
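A single LMS update following equation (3.7b) can be sketched as below. The function name `lms_step`, its argument layout, and the constant test series are assumptions chosen for illustration; the sketch is not presented as the implementation used by the invention.

```python
def lms_step(w, window, actual_next, mu):
    """One LMS update following equation (3.7b).

    w           : current coefficients [w(0), ..., w(p-1)]
    window      : latest intervals [x(n), x(n-1), ..., x(n-p+1)]
    actual_next : the measured interval x(n+1)
    mu          : adaptation constant (step size)
    Returns (prediction, error, updated coefficients).
    """
    prediction = sum(wl * xl for wl, xl in zip(w, window))
    error = actual_next - prediction
    new_w = [wl + mu * error * xl for wl, xl in zip(w, window)]
    return prediction, error, new_w


# Hypothetical constant series of 1.0 ms intervals: the absolute error
# should shrink at every step while mu stays inside the stable range.
w = [0.0, 0.0]
errors = []
for _ in range(50):
    _, e, w = lms_step(w, [1.0, 1.0], 1.0, mu=0.05)
    errors.append(abs(e))
```

For this constant series the error decays geometrically, which illustrates the feedback of e(n) into the coefficients described above.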
According to the invention, a normalized LMS (NLMS) is a modification to the
LMS algorithm where the update equation is:

$$\mathbf{w}(n+1) = \mathbf{w}(n) + \mu\,\frac{e(n)\,\mathbf{x}(n)}{\|\mathbf{x}(n)\|^{2}} \tag{3.8}$$

where $\|\mathbf{x}(n)\|^{2} = \mathbf{x}(n)^{T}\mathbf{x}(n)$. NLMS has the advantage over LMS
of less sensitivity to the step size $\mu$. Using a large $\mu$ results in a faster convergence
and quicker response to signal changes. However, after convergence, the prediction
parameters have larger fluctuations. On the other hand, using a small $\mu$ results in a slower
convergence, but smaller fluctuations after convergence. There is a tradeoff between faster
convergence and smaller fluctuations.
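The reduced sensitivity of the normalized update can be illustrated with a short sketch of equation (3.8). The names `nlms_step` and `relative_errors`, the small `eps` safeguard, and the test values are assumptions added for this example and are not part of the description.

```python
def nlms_step(w, window, actual_next, mu, eps=1e-12):
    """One normalized LMS update following equation (3.8)."""
    prediction = sum(wl * xl for wl, xl in zip(w, window))
    error = actual_next - prediction
    norm_sq = sum(xl * xl for xl in window)  # ||x(n)||^2 = x(n)^T x(n)
    # eps is an added safeguard (an assumption, not in the source)
    # against dividing by zero for an all-zero window.
    scale = mu * error / (norm_sq + eps)
    new_w = [wl + scale * xl for wl, xl in zip(w, window)]
    return prediction, error, new_w


def relative_errors(scale_factor, steps=20, mu=0.5):
    """Track e(n)/x for a constant interval expressed in different units."""
    w = [0.0, 0.0]
    out = []
    for _ in range(steps):
        x = 3.0 * scale_factor
        _, e, w = nlms_step(w, [x, x], x, mu)
        out.append(e / x)
    return out


in_ms = relative_errors(1.0)      # intervals of 3 (milliseconds)
in_us = relative_errors(1000.0)   # same intervals in microseconds
```

Because the update is divided by the squared norm of the input window, the two runs follow essentially the same relative error trajectory even though the inputs differ by a factor of 1000.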

A flowchart of LMS prediction according to an embodiment of the invention is
shown in Figure 4. The steps are: step 400, at the start of a talkspurt, n = 0 and an initial
w(n) is estimated; step 405, a packet is received and a packet arrival interval x(n) is
obtained; step 410, the next packet arrival interval is predicted as $\hat{x}(n+1)$; step 415,
another packet is received and the actual next packet arrival interval x(n+1) is obtained;
step 420, the error e(n) is calculated using equation (3.3); step 425, an updated coefficient
w(n+1) is calculated using equation (3.8), where the normalized LMS prediction algorithm
is used; step 430, the LMS prediction algorithm is updated with the parameters
w(n+1) and x(n+1), and the oldest interval parameters w(n-p+1) and x(n-p+1) are dropped;
and step 435, go to step 410 until the talkspurt ends.

According to another embodiment as also shown in Figure 4, equation (3.7b) is
substituted for equation (3.8) in step 425, where the LMS prediction algorithm is used.
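The loop of Figure 4 can be sketched as an on-line predictor object that is fed one measured interval at a time. This is a hedged illustration only: the class name `IntervalPredictor`, the uniform initial coefficients, and the choice to return `None` until p intervals have been seen are assumptions, since the description does not specify how the initial w(n) is estimated.

```python
from collections import deque


class IntervalPredictor:
    """Sliding-window NLMS predictor for packet arrival intervals.

    Hypothetical helper mirroring steps 400 to 435 of Figure 4.
    """

    def __init__(self, p, mu, initial_w=None):
        self.p = p
        self.mu = mu
        # Step 400: initial estimate of the coefficients (assumed uniform).
        self.w = list(initial_w) if initial_w else [1.0 / p] * p
        # Holds x(n), x(n-1), ..., x(n-p+1), newest first.
        self.window = deque(maxlen=p)
        self.prediction = None

    def observe(self, interval):
        """Feed one measured interval; return the prediction for the next."""
        if self.prediction is not None:
            # Steps 420 and 425: error, then normalized coefficient update
            # using the window that produced the previous prediction.
            error = interval - self.prediction
            norm_sq = sum(x * x for x in self.window)
            if norm_sq > 0.0:
                step = self.mu * error / norm_sq
                self.w = [w + step * x for w, x in zip(self.w, self.window)]
        # Step 430: slide the window; the oldest interval drops out.
        self.window.appendleft(interval)
        if len(self.window) == self.p:
            # Step 410: predict the next interval from the latest p values.
            self.prediction = sum(w * x for w, x in zip(self.w, self.window))
        return self.prediction


predictor = IntervalPredictor(p=3, mu=0.05)
predictions = [predictor.observe(x) for x in [3.0, 3.0, 3.0, 3.0]]
```

A new `IntervalPredictor` would be created at the start of each talkspurt, matching the reset of n to 0 in step 400.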

An adaptive predictive playout mechanism, based on LMS prediction of Figure 4,
is shown in Figure 5. It is composed of three components: 1) a smoothing buffer 10, 2) an
LMS traffic predictor 12, and 3) a CBR (Constant Bit Rate) player 14. The arriving
packets are queued in the smoothing buffer 10. LMS predictor 12 employs an online
algorithm as shown in Figure 4, using the normalized LMS prediction algorithm, to predict
the arrival interval of the next incoming packet. Based on the predicted packet arrival
interval, the CBR player 14 derives an adaptive buffer delay by means of discarding the
oldest packets in the buffer if necessary.

The first few packets of each talkspurt are buffered to smooth the jitter. There are
two conditions for starting playout of packets: current buffer length Q is greater than the
buffer threshold L0, and queuing time of the oldest packet in buffer B is greater than the
maximum acceptable playout latency T0. Whenever either of these two conditions is met,
the CBR player 14 starts playout of packets at a constant bit rate.
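The two start conditions can be expressed as a simple predicate. The function name `should_start_playout` and the millisecond units are assumptions made for this sketch; the comparison operators follow the "greater than" wording of the paragraph above.

```python
def should_start_playout(queue_length, oldest_wait_ms, l0, t0_ms):
    """Return True once either start condition for the talkspurt is met.

    queue_length   : current buffer length Q, in packets
    oldest_wait_ms : queuing time B of the oldest buffered packet
    l0             : buffer threshold L0, in packets
    t0_ms          : maximum acceptable playout latency T0
    """
    return queue_length > l0 or oldest_wait_ms > t0_ms


# Hypothetical values: L0 = 50 packets, T0 = 150 ms.
start_on_length = should_start_playout(51, 10.0, l0=50, t0_ms=150.0)
start_on_latency = should_start_playout(5, 151.0, l0=50, t0_ms=150.0)
keep_buffering = not should_start_playout(50, 150.0, l0=50, t0_ms=150.0)
```

The latency condition ensures that a slow start of a talkspurt does not hold the first packet beyond T0 while waiting for the buffer to fill.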

During the playout of the packets at a constant draining rate P0, where P0 is
determined by the codec used to encode the talkspurt into the packets, when the number of
packets in the buffer is greater than L0, the arrival interval of the next incoming packet is
predicted by the LMS predictor 12. If the estimated next arrival interval is smaller than the
draining threshold D0, then this packet is predicted to arrive at the destination relatively
early and to be buffered relatively longer than the previously received packets. If the
oldest packet in the buffer is discarded, the latency of the next incoming packet is expected
to be reduced, but without increasing the probability of causing a gap, as there are a
number of packets in the buffer queue. If the prediction of the packet arrival interval is
greater than the draining threshold D0, then this packet is expected to arrive at a time when
all the packets have been played out, and thus no packet needs to be discarded. With such a
prediction, the receiver continues to play out the remaining packets provided that the
maximum acceptable playout latency is not exceeded. After the playout of the last packet
of the talkspurt, or when no packet has arrived for some time since the arrival of the last
packet, the talkspurt playout is finished. The receiver then starts or resets to play out the
next talkspurt.

Flowcharts of the operation of the adaptive predictive playout mechanism of
Figure 5 are shown in Figure 6. The parameters B, T0, P0, D0, L0, and Q for the
mechanism are also shown. The steps of the operation are: step 600, waiting for a
talkspurt; step 610, receipt of a new talkspurt; step 620, initial smoothing of the packets of
the talkspurt, which comprises receiving packets (step 622) and holding the packets in the
smoothing buffer 10 until the current buffer length Q reaches the threshold L0 or B (=
queuing time of the oldest packet in buffer) is greater than T0 (= maximum acceptable
playout latency) (step 624); and step 640, playout of the packets in the buffer 10.

The playout 640 comprises step 642, playing out the oldest packet in buffer 10 at a
constant draining rate P0 as determined by the codec used to encode the packets; and step
644, checking the buffer length Q to determine whether the last packet in the buffer 10 has
been played out. If it has been played out, the process goes to step 600 to wait for the next
talkspurt; if not, the process goes to step 642 to play out the next packet.

As the packets are being played out, further packets are also being received and
added to the buffer 10 (step 646). For each received packet, the LMS predictor 12 is
updated according to the normalized LMS prediction algorithm (step 648) and the buffer
length Q is checked to determine if Q is below the buffer threshold L0 (step 650). If Q is
greater than L0, then a next incoming packet arrival interval d is predicted (step 652). The
interval d is compared (step 654) with the draining threshold D0 to control possible
flooding where d is not greater than D0, and B is compared with T0 to ensure that the
maximum playout latency T0 still remains acceptable, that is, that B is not greater than T0.
If either of the conditions in step 654 is not satisfied, then the oldest packet in the buffer 10
is discarded (step 656).
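Steps 646 to 656 can be sketched as a single arrival handler. The function name `on_packet_arrival`, the tuple layout of the buffer entries, and the idea of passing the predicted interval d in as an argument are assumptions made for this illustration; the LMS predictor update of step 648 is deliberately left outside the sketch.

```python
from collections import deque


def on_packet_arrival(buffer, packet, now_ms, predicted_interval_ms,
                      l0, d0_ms, t0_ms):
    """Add a packet to the smoothing buffer and apply the discard rule.

    buffer holds (arrival_time_ms, packet) pairs, oldest first.
    predicted_interval_ms is the predictor's estimate d (hypothetical input).
    Returns the discarded packet, or None if nothing was dropped.
    """
    buffer.append((now_ms, packet))              # step 646
    if len(buffer) <= l0:                        # step 650: Q not above L0
        return None
    oldest_wait_ms = now_ms - buffer[0][0]       # B, queuing time of oldest
    arrives_late = predicted_interval_ms > d0_ms
    latency_ok = oldest_wait_ms <= t0_ms
    if arrives_late and latency_ok:              # step 654: both satisfied
        return None
    return buffer.popleft()[1]                   # step 656: drop the oldest


# Hypothetical run with L0 = 2 packets, D0 = 6 ms, T0 = 150 ms.
buf = deque([(0.0, "a"), (3.0, "b")])
dropped_first = on_packet_arrival(buf, "c", 6.0, predicted_interval_ms=2.0,
                                  l0=2, d0_ms=6.0, t0_ms=150.0)
dropped_second = on_packet_arrival(buf, "d", 9.0, predicted_interval_ms=8.0,
                                   l0=2, d0_ms=6.0, t0_ms=150.0)
```

In the first call the next packet is predicted to arrive early (2 ms against a 6 ms threshold), so the oldest packet is dropped; in the second call the prediction exceeds D0 and the latency is acceptable, so nothing is discarded.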

Various scenarios have been tested using simulations of the adaptive
prediction playout scheme of the invention. Without limiting the scope of the invention,
the results of the estimated probabilistic QoS values (delay, delay jitter, loss and gap
probabilities) within the range of specified operating parameter values are provided herein
below. With parameter values specified as follows:

• Exponential packet arrival with the mean varying between 1.5 ms and 3 ms,

• Buffer threshold L0 of 50, 55, 60, 75 or 100 packets,

• Maximum acceptable playout latency T0 of 150 ms,

• Draining threshold D0 of 6 ms,

• Constant packet draining rate P0 of 1 packet every 1.5 ms or 3 ms,

• Packet length of 1024 bits,

• Prediction step size $\mu$ of 0.05, and

• Sliding window size p of 1, 3, 5 or 10;
it was observed that,

• As the sliding window size increased, the packet gap or loss probabilities (varying
between 0.3% and 1.1%) increased, with the most drastic deterioration occurring
when the window size jumped from 1 to 3; increasing the buffer threshold decreased
the values of gap or loss probabilities, which are annoying to voice users when they
are too high;

• As the buffer threshold increased, the mean of queuing delay (varying between 80
ms and 148 ms) also increased proportionally; decreasing window size improved the
delay, with a very strong improvement occurring when window size was reduced
from 3 to 1; and

• Delay jitter statistics were not collected in the initial experimentation due to the fact
that their impact on voice QoS is accounted for by the packet loss or gap
probabilities, and the packet draining rate (which was constant).
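As a back-of-envelope aid for reading the parameter list, the sketch below computes two quantities that follow directly from the listed values: the time needed to drain a buffer filled to the threshold L0, and the playout bit rate implied by the packet length and draining rate. The function names are hypothetical, and the numbers are plain arithmetic on the stated parameters, not simulation results from the description.

```python
PACKET_BITS = 1024  # packet length from the parameter list


def drain_time_ms(threshold_packets, drain_interval_ms):
    """Time for the player to drain a buffer holding threshold_packets."""
    return threshold_packets * drain_interval_ms


def playout_bit_rate_kbps(drain_interval_ms):
    """Playout bit rate in kbit/s for one packet every drain_interval_ms."""
    return PACKET_BITS / drain_interval_ms  # bits per ms equals kbit/s


# Drain times at one packet every 1.5 ms for each listed buffer threshold.
drain_times = {l0: drain_time_ms(l0, 1.5) for l0 in (50, 55, 60, 75, 100)}
rate_fast = playout_bit_rate_kbps(1.5)
rate_slow = playout_bit_rate_kbps(3.0)
```

At one packet every 1.5 ms, a buffer of 100 packets takes 150 ms to drain, which coincides with the listed T0; this is only an observation on the parameter values.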
Although preferred embodiments of the invention have been described herein, it
will be understood by those skilled in the art that variations may be made thereto without
departing from the scope of the invention or the scope of the appended claims.






WHAT IS CLAIMED IS: 

1. An apparatus for adaptive prediction playout of a talkspurt, the talkspurt
comprising a series of packets received by the apparatus, the apparatus comprising: 

a buffer for buffering received packets of the talkspurt where each packet has a 
latency time in the buffer; 

an LMS predictor using a Least Mean Square algorithm for calculating a predicted
next packet arrival interval after receiving each packet of the talkspurt to predict 
when a next packet will be received; and 

a constant bit rate player for playing out the packets in the buffer at a substantially 
constant rate; 

whereby the packet having the greatest latency in the buffer is discarded when the 
predicted next packet arrival interval is less than a draining threshold so that the 
latency of the packets in the buffer is controlled. 

2. The apparatus of claim 1, wherein the constant bit rate player starts to play out the 
packets in the buffer on first occurrence of one of the number of packets in the 
buffer exceeding a predefined buffer threshold and the packet in the buffer having 
the greatest latency exceeding a predefined maximum acceptable playout latency. 

3. The apparatus of claim 1 or 2, wherein calculating predicted next packet arrival
intervals comprises:

(a) selecting an initial set of prediction filter coefficients w(l) where l = 0, 1, ...,
p-1 at the start of the talkspurt;

(b) calculating one predicted next packet arrival interval $\hat{x}(n+1)$ after
receiving an n-th packet using prediction equation

$$\hat{x}(n+1) = \sum_{l=0}^{p-1} w(l)\, x(n-l)$$

where x(n), x(n-1), x(n-2), ..., x(n-p+1) denotes a series of p received packet
arrival intervals;

(c) receiving the next packet and measuring the next packet arrival interval
x(n+1);

(d) calculating a prediction filter coefficient w(n+1) for x(n+1) from the least
mean square error of the difference between x(n+1) and $\hat{x}(n+1)$;

(e) updating the prediction equation $\hat{x}(n+1)$ in step (b) by adding w(n+1) and
deleting the w(p-1) and incrementing n by one for calculating the predicted
next packet arrival interval; and

(f) repeating (b) to (e) after receiving each packet until the talkspurt ends.

4. The apparatus of claim 3, wherein the prediction filter coefficient w(n+1) for
x(n+1) is calculated from the least mean square error of the difference between
x(n+1) and $\hat{x}(n+1)$ with a weighting function to reduce the effect of the next
packet arrival interval x(n+1) as compared to earlier packet arrival intervals.

5. The apparatus of claim 4, wherein the prediction filter coefficient w(n+1) is w(n) +
$\mu$e(n)X(n) where e(n) is x(n+1) less $\hat{x}(n+1)$, X(n) is [x(n), x(n-1), ..., x(n-p+1)]^T,
and $\mu$ is a predefined step size for adjusting prediction error e(n), which $\mu$ is in
the range

$0 < \mu < 2/\lambda_{\max}$, where $\lambda_{\max}$ is the maximum eigenvalue of $R_x$, where $R_x$
is the autocorrelation of the vector x.

6. The apparatus of claim 5, wherein the prediction filter coefficient w(n+1) is calculated
from a normalized least mean square algorithm where

$$w(n+1) = w(n) + \mu\,\frac{e(n)\,X(n)}{\|X(n)\|^{2}} \quad\text{and}\quad \|X(n)\|^{2} = X(n)^{T}X(n)$$

7. A method of adaptive prediction playout of a talkspurt, the talkspurt comprising a
series of packets as received, the method comprising: 

buffering received packets of the talkspurt where each packet has a latency in the 
buffer; 

using a Least Mean Square algorithm for calculating a predicted next packet
arrival interval after receiving each packet of the talkspurt to predict when a next 
packet will be received; and 

playing out the packets in the buffer at a substantially constant rate; 

whereby the packet having the greatest latency in the buffer is discarded when the 
predicted next packet arrival interval is less than a draining threshold so that the 
latency of the packets in the buffer is controlled. 

8. The method of claim 7, wherein the constant bit rate player starts to play out the
packets in the buffer on first occurrence of one of the number of packets in the 
buffer exceeding a predefined buffer threshold and the packet in the buffer having 
the greatest latency exceeding a predefined maximum acceptable playout latency. 

9. The method of claim 7 or 8, wherein calculating predicted next packet arrival
intervals comprises:

(a) selecting an initial set of prediction filter coefficients w(l) where l = 0, 1, ...,
p-1 at the start of the talkspurt;

(b) calculating one predicted next packet arrival interval $\hat{x}(n+1)$ after
receiving an n-th packet using prediction equation

$$\hat{x}(n+1) = \sum_{l=0}^{p-1} w(l)\, x(n-l)$$

where x(n), x(n-1), x(n-2), ..., x(n-p+1) denotes a series of p received packet
arrival intervals;

(c) receiving the next packet and measuring the next packet arrival interval
x(n+1);

(d) calculating a prediction filter coefficient w(n+1) for x(n+1) from the least
mean square error of the difference between x(n+1) and $\hat{x}(n+1)$;

(e) updating the prediction equation $\hat{x}(n+1)$ in step (b) by adding w(n+1) and
deleting the w(p-1) and incrementing n by one for calculating the predicted
next packet arrival interval; and

(f) repeating (b) to (e) after receiving each packet until the talkspurt ends.

10. The method of claim 9, wherein the prediction filter coefficient w(n+1) for
x(n+1) is calculated from the least mean square error of the difference between
x(n+1) and $\hat{x}(n+1)$ with a weighting function to reduce the effect of the next
packet arrival interval x(n+1) as compared to earlier packet arrival intervals.

11. The method of claim 10, wherein the prediction filter coefficient w(n+1) is w(n) +
$\mu$e(n)X(n) where e(n) is x(n+1) less $\hat{x}(n+1)$, X(n) is [x(n), x(n-1), ..., x(n-p+1)]^T,
and $\mu$ is a predefined step size for adjusting prediction error e(n), which $\mu$ is in
the range

$0 < \mu < 2/\lambda_{\max}$, where $\lambda_{\max}$ is the maximum eigenvalue of $R_x$, where $R_x$
is the autocorrelation of the vector x.

12. The method of claim 11, wherein the prediction filter coefficient w(n+1) is calculated
from a normalized least mean square algorithm where

$$w(n+1) = w(n) + \mu\,\frac{e(n)\,X(n)}{\|X(n)\|^{2}} \quad\text{and}\quad \|X(n)\|^{2} = X(n)^{T}X(n)$$



Abstract 



An adaptive predictive playout scheme, based on a Least Mean Square (LMS) 
prediction algorithm, for packet voice applications. The packets are received and stored in 
a buffer for playout at a constant draining rate P0, where P0 is determined by the codec
used. The latency of the packets in the buffer is controlled by discarding the oldest packet 
in the buffer when the predicted time interval for receipt of the next incoming packet is 
less than a draining threshold. 
Figure 1 Voice source behavior
Figure 2 Linear predictor
Figure 3 Adaptive linear predictor
Figure 4 Flowchart of LMS prediction algorithm
Figure 5 A block diagram of an adaptive prediction playout mechanism
Figure 6 continued



